
A Appendix

Neural Information Processing Systems

A.1 Prototype-based Graph Information Bottleneck (Eq. 4). From Eq. 3, the GIB objective is derived as a minimization. We perform ablation studies to examine the effectiveness of our model and its variant; in Figure 7, the "with all" setting represents our final model, which includes all the components. We conduct graph-classification experiments using different readout functions for PGIB. We illustrate the reasoning process on two datasets, MUTAG and BA2Motif, in Figure 8: PGIB computes the "points contributed" to predicting each class by multiplying the similarity between the input and each prototype by the corresponding class weight. We have also conducted additional qualitative analysis. It is crucial that the prototypes not only contain key structural information from the input graph but also maintain a certain level of diversity, since each class is represented by multiple prototypes. The goal is to make the masked subgraph's prediction as close as possible to that of the original graph, which helps to detect significant substructures.
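The "points contributed" computation described above can be sketched as follows. This is a toy illustration, not the authors' code; the names (`similarities`, `class_weights`) and the 6-prototype/2-class sizes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_prototypes, n_classes = 6, 2            # e.g., 3 prototypes per class (illustrative)
similarities = rng.random(n_prototypes)   # similarity of the input graph to each prototype
class_weights = rng.random((n_prototypes, n_classes))  # prototype-to-class weights

# "Points contributed" to each class: similarity * class weight,
# then summed over prototypes to obtain a class score.
points = similarities[:, None] * class_weights
class_scores = points.sum(axis=0)
predicted = int(class_scores.argmax())
```

Inspecting `points` row by row shows which prototype is most responsible for a prediction, which is the basis of the reasoning-process visualization in Figure 8.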




A Differentiable Logical Operators

Neural Information Processing Systems

Fuzzy operators can be applied to vectors of continuous values within a certain range, e.g., [0, 1]. Different fuzzy logics implement different t-norms and t-conorms. NodePiece-QE results are reported in Table 13 in Appendix D. We sampled 9 datasets (used in Section 5.2 and Section 5.3) from the original FB15k-237 [29]. Creation details are provided in Section 5.1, along with statistics on the sampled data. We use those queries in Section 5.5. Table 5: Statistics on sampled queries for each dataset ratio and query type. Furthermore, for the experiment in Section 5.3, which measures the ability of inductive models to find new answers, most queries (except 2i and 3i) have new answer sets.
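As a concrete illustration of how different fuzzy logics implement different t-norms and t-conorms, the three standard families can be applied elementwise to vectors of truth values in [0, 1]. This is a generic sketch; which of these the model actually uses is not stated in this excerpt.

```python
import numpy as np

a = np.array([0.2, 0.9, 0.5])
b = np.array([0.6, 0.4, 0.5])

godel_and   = np.minimum(a, b)               # Goedel t-norm (conjunction)
godel_or    = np.maximum(a, b)               # Goedel t-conorm (disjunction)
product_and = a * b                          # product t-norm
product_or  = a + b - a * b                  # product t-conorm
luka_and    = np.maximum(0.0, a + b - 1.0)   # Lukasiewicz t-norm
luka_or     = np.minimum(1.0, a + b)         # Lukasiewicz t-conorm
```

All six operators are differentiable almost everywhere, which is what allows them to serve as logical operators inside an end-to-end trained query-answering model.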






WST: Weakly Supervised Transducer for Automatic Speech Recognition

Gao, Dongji, Liao, Chenda, Liu, Changliang, Wiesner, Matthew, Garcia, Leibny Paola, Povey, Daniel, Khudanpur, Sanjeev, Wu, Jian

arXiv.org Artificial Intelligence

The Recurrent Neural Network-Transducer (RNN-T) is widely adopted in end-to-end (E2E) automatic speech recognition (ASR) tasks but depends heavily on large-scale, high-quality annotated data, which are often costly and difficult to obtain. To mitigate this reliance, we propose a Weakly Supervised Transducer (WST), which integrates a flexible training graph designed to robustly handle errors in the transcripts without requiring additional confidence estimation or auxiliary pre-trained models. Empirical evaluations on synthetic and industrial datasets reveal that WST effectively maintains performance even with transcription error rates of up to 70%, consistently outperforming existing Connectionist Temporal Classification (CTC)-based weakly supervised approaches, such as Bypass Temporal Classification (BTC) and Omni-Temporal Classification (OTC). These results demonstrate the practical utility and robustness of WST in realistic ASR settings. The implementation will be publicly available.
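The synthetic-data evaluation mentioned above relies on transcripts corrupted at controlled error rates. A minimal sketch of how such noisy transcripts can be generated is shown below; the `corrupt` function, its operation mix, and the tiny vocabulary are assumptions for illustration, not the paper's procedure.

```python
import random

def corrupt(words, error_rate, vocab, seed=0):
    """Randomly substitute, delete, or insert words at roughly `error_rate`."""
    rng = random.Random(seed)
    out = []
    for w in words:
        if rng.random() < error_rate:
            op = rng.choice(["sub", "del", "ins"])
            if op == "sub":
                out.append(rng.choice(vocab))   # replace with a wrong word
            elif op == "ins":
                out.append(w)
                out.append(rng.choice(vocab))   # keep word, insert a spurious one
            # "del": drop the word entirely
        else:
            out.append(w)
    return out

clean = "the cat sat on the mat".split()
noisy = corrupt(clean, 0.7, vocab=["dog", "ran", "hat"])
```

Sweeping `error_rate` from 0 to 0.7 over such corrupted transcripts mirrors the kind of robustness curve the WST evaluation reports.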